Exercise 21 B 



1. For each set of data below calculate the standard deviation: 

(a) (i) 19.0,23.4,36.2,18.7,15.7 
(ii) 0.4,-1.3,7.9,8.4,-9.4 

(b) (i) 28,31,54,28,17,30 
(ii) 60,18,42,113,95,23 




2. For each set of data below calculate the standard deviation: 
(a) 1,1,2,3,5 (b) 3,-2,4,-2,5,2 



3. The ordered set of data 5, 5, 7, 8, 9, x, 13 has interquartile
range equal to 7.

(a) Find the value of x.

(b) Find the standard deviation of the data set. [5 marks]

4. Consider the five numbers 2, 5, 9, x and y. The mean of the
numbers is 5 and the variance is 6. Find the value
of xy. [7 marks]

5. In five tests Suewan has an average of 23 marks and a
standard deviation of 4 marks. In her sixth test she
scores 32. What is the overall standard deviation
of her marks? [7 marks]

6. The mean of a set of 15 data items is 600 and the standard
deviation is 12. Another piece of data is discovered and
the new mean is 600.25. What is the new standard
deviation? [7 marks]



7. If the sum of 20 pieces of data is 1542, find the smallest
possible value of $\sum x^2$. [4 marks]



8. (a) Explain why, for any set of data, $x_i - \bar{x}$ is no greater than
the range.

(b) By considering the formula $s_n = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$,
prove that the standard deviation is always less than or
equal to the range. [4 marks]
Frequency tables and grouped data

It is common to summarise a large quantity of data in a frequency 
distribution table. This is a list of all the values the data takes, along 
with how often they occur. We could convert this into a list of all 
the data values and calculate the statistics as we had before, but it is 
enough to just imagine writing out a list, 16 ones, two twos, etc. 



Worked example 21.3 



Find the mean number of passengers observed in cars as they passed a school. 



Passengers     Frequency
0              32
1              16
2              2
3 or more      0

(None in the first group, 16 in the second group and 4 in the third group.)

total number of passengers = (32 × 0) + (16 × 1) + (2 × 2) + 0 = 20

mean = 20/50 = 0.4

This method suggests an important formula.

KEY POINT 21.3

Finding the mean from a frequency table:

$\bar{x} = \frac{\sum f_i x_i}{n}$

where $f_i$ is the frequency of the $i$th data value and
$n = \sum f_i$ is the total number of data items.

We can work out $\overline{x^2}$ in a similar way, which gives the following
formula for standard deviation.

KEY POINT 21.4

Standard deviation from a frequency table:

$s_n^2 = \frac{\sum f_i x_i^2}{n} - \bar{x}^2$
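As an illustrative sketch (not part of the original text), the two frequency-table formulae can be checked in Python on the passenger data from Worked example 21.3. The function name `mean_and_sd` is hypothetical, and the '3 or more' group is treated as the value 3, which does not affect the result because its frequency is 0.

```python
from math import sqrt

# Frequency table from Worked example 21.3: data value -> frequency.
# Assumption: the "3 or more" group is entered as 3 (its frequency is 0).
passengers = {0: 32, 1: 16, 2: 2, 3: 0}

def mean_and_sd(freq):
    """Return (mean, standard deviation) using sum(f*x)/n and sum(f*x^2)/n."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    mean_of_squares = sum(x * x * f for x, f in freq.items()) / n
    return mean, sqrt(mean_of_squares - mean ** 2)

mean, sd = mean_and_sd(passengers)
print(round(mean, 3), round(sd, 3))
```

The mean agrees with the worked example (20 passengers over 50 cars); the standard deviation comes from the second formula with the same table.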
In Worked example 21.3, we knew the exact data values, but
when we are dealing with grouped data, we do not have this 
level of precision. In order to work out the mean and standard 
deviation, our best and simplest assumption is that all the original 
values in a particular group are located at the centre of the group, 
called the mid-interval value. To find the centre of the group we 
take the mean of the largest and the smallest possible values in the 
group, called the upper and lower interval boundaries. 



Worked example 21.4 



Find the mean and standard deviation of the weight of eggs produced by a chicken farm. 
Explain why these answers are only estimates. 



Weight of eggs, in g     Frequency
[100, 120[               26
[120, 140[               52
[140, 160[               84
[160, 180[               60
[180, 200[               12



Make a table using the mid-interval value for each group:

x_i      f_i      x_i f_i      x_i^2 f_i
110      26       2860         314600
130      52       6760         878800
150      84       12600        1890000
170      60       10200        1734000
190      12       2280         433200
Sum:     234      34700        5250600

Apply the formulae:

$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{34700}{234} = 148.3$ g

$s_n^2 = \frac{\sum x_i^2 f_i}{n} - \bar{x}^2 = \frac{5250600}{234} - 148.3^2 = 448.4$ (using the unrounded mean)

Therefore $s_n = 21.2$ g (3SF)

Whenever you find a mean or a standard deviation it is always worth checking that
it makes sense for the data; an average of about 150 g here seems reasonable.

These answers are only estimates because we have 
assumed that a\\ the values in each group are at the 
centre, rather than using the actual data. 
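The grouped-data calculation can be reproduced with a short Python sketch. This is only an illustration of the mid-interval method described above; the variable names are hypothetical, and the frequencies are those in the egg-weight table.

```python
from math import sqrt

# (lower boundary, upper boundary, frequency) for each group [lower, upper[
groups = [(100, 120, 26), (120, 140, 52), (140, 160, 84),
          (160, 180, 60), (180, 200, 12)]

n = sum(f for _, _, f in groups)
# Every value in a group is assumed to sit at the mid-interval value.
sum_xf = sum((lo + hi) / 2 * f for lo, hi, f in groups)
sum_x2f = sum(((lo + hi) / 2) ** 2 * f for lo, hi, f in groups)

mean = sum_xf / n
sd = sqrt(sum_x2f / n - mean ** 2)
print(n, sum_xf, sum_x2f, round(mean, 1), round(sd, 1))
```

Keeping the unrounded mean inside the variance step matters here: squaring the rounded value 148.3 instead would shift the variance by almost 3.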
Sometimes the endpoints of the intervals shown in the table
are not the actual smallest and largest possible values in that 
group. For example, when measuring length in centimetres it 
is common to round the values to the nearest integer, so 10-15 
actually means [9.5, 15.5]. To find the mid-interval values we 
must first identify the actual interval boundaries. 



Worked example 21.5 



Estimate the mean of this data:

Age          Frequency
10 to 12     27
13 to 15     44
16 to 19     29

Carefully decide on the upper and lower interval boundaries. There should
be no 'gaps' between the groups, because age is continuous data. You
are 12 years old until your 13th birthday.

Group        x_i      f_i      x_i f_i
[10, 13[     11.5     27       310.5
[13, 16[     14.5     44       638
[16, 20[     18       29       522
Sum:                  100      1470.5

$\bar{x} = \frac{1470.5}{100} = 14.705 \approx 14.7$ (3SF)
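A minimal Python sketch of the boundary adjustment used in this example, assuming ages are recorded in completed years (so a group labelled '10 to 12' really covers [10, 13[). The helper name `mid_interval` is hypothetical.

```python
# (lowest age, highest age, frequency), ages given in completed years
groups = [(10, 12, 27), (13, 15, 44), (16, 19, 29)]

def mid_interval(lowest, highest):
    # Someone recorded as `highest` years old stays that age until the next
    # birthday, so the true interval is [lowest, highest + 1[.
    return (lowest + (highest + 1)) / 2

n = sum(f for _, _, f in groups)
estimated_mean = sum(mid_interval(lo, hi) * f for lo, hi, f in groups) / n
print(n, estimated_mean)
```

For data rounded to the nearest unit the adjustment would instead be half a unit on each side, as in the 10-15 centimetre example earlier in this section.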




Exercise 21 C 
1. Calculate the mean and standard deviation of each data set: 
(a)
x      Frequency
0      16
1      22
2      8
3      4
4      0

(b)
x      Frequency
-1     10
0      8
1      5
2      1
3      1
2. Calculate the mean and standard deviation for each
data set:

(a)
x      Frequency
10     7
12     19
14     2
16     0
18     2

(b)
x      Frequency
0.1    16
0.2    15
0.3    12
0.4    9
0.5    8



3. A group is described as '17 - 20'. State the upper and lower
boundaries of this group if it is measuring:

(a) age in completed years 

(b) number of pencils 

(c) length of a worm to the nearest centimetre 

(d) hourly earnings, rounded up to whole dollars. 



4. Find the mean and standard deviation of each of the 
following sets of data: 

(a) (i) x is the time taken to complete a puzzle in seconds

x              Frequency
[0, 15[        19
[15, 30[       15
[30, 45[       7
[45, 60[       5
[60, 90[       4

(ii) x is the weight of plants in grams

x              Frequency
[50, 100[      17
[100, 200[     23
[200, 300[     42
[300, 500[     21
[500, 1000[    5
(b) (i) x is the length of fossils found in a geological dig, to
the nearest centimetre

x              Frequency
0 to 4         71
5 to 10        43
11 to 15       22
16 to 30       6

(ii) x is the power consumption of light bulbs, to the
nearest watt

x              Frequency
90 to 95       17
96 to 100      23
101 to 105     42
106 to 110     21
111 to 120     5

(c) (i) x is the age of children in a hospital ward

x              Frequency
0 to 2         12
3 to 5         15
6 to 10        7
11 to 16       6
17 to 18       3

(ii) x is the amount of tips paid in a restaurant, rounded
down to the nearest dollar

x              Frequency
0 to 5         17
6 to 10        29
11 to 20       44
21 to 30       16
31 to 50       8
5. In a sample of 50 boxes of eggs, the number of broken eggs
per box is shown below:

Number of broken eggs     0     1     2     3     4     5     6
Number of boxes           17    8     7     7     6     5     0

(a) Calculate the median number of broken eggs per box. 

(b) Calculate the mean number of broken eggs per box. [4 marks] 

6. The mean of the data in the table is 32 and the variance is
136. Find the possible values of p and q.

x      Frequency
20     12
40     q
p      8

[8 marks]



Summary 

• Most of statistics is based on trying to infer properties of a population based upon a sample 
from that population. 

• To get a representative sample of the population, it is good practice to collect a random 
sample, where each member of the population is equally likely to be selected for the sample 
and the probability of selecting a member of the population is independent. 

• An outlier is a correct but unusual data value. 

• An anomaly is an unusual data value caused by a measurement error. 

• Discrete data can take only certain predefined values (they do not have to be integers!).

• Continuous data can take any value in a given range. This type of data is generally grouped 
before we can work with it. 

• Standard deviation ($s_n$) is a measure of how spread out the data is relative to the data's mean,
and it takes into account all of the data.

• The square of the standard deviation is called the variance and it has the formula:

$s_n^2 = \frac{\sum (x_i - \bar{x})^2}{n}$

or more commonly: $s_n^2 = \overline{x^2} - \bar{x}^2$

• Large datasets are summarised in frequency distribution tables and the mean and standard
deviations can be calculated from these tables using the formulae:

$\bar{x} = \frac{\sum f_i x_i}{n}$

where $f_i$ is the frequency of the $i$th data value and $n = \sum f_i$ is the total number of data
items, and:

$s_n^2 = \frac{\sum f_i x_i^2}{n} - \bar{x}^2$

• When investigating grouped data we must assume that every element has the mid-interval
value of the group (the mean of the upper and lower boundaries).

• The methods for calculating statistics for grouped data vary slightly depending upon whether
the data is discrete or continuous.

Introductory problem revisited 

The magnetic dipole of an electron is measured in a very sensitive experiment 3 times.
The values are 2.000 001 5, 2.000 001 2 and 2.000 000 9. Does this support the theory
that the magnetic dipole is 2?

The average magnetic dipole is 2.000 001 2, which is pretty close to 2, but the standard
deviation in the measurements is 0.000 000 245, and so the mean is approximately 5 sample
standard deviations away from 2. Within the natural variation observed, the magnetic dipole
cannot be said to be 2.
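The figures quoted here can be checked with a small Python sketch. To avoid floating-point noise it works with the offsets of the three measurements from 2, in units of 10^-7 (an illustrative choice, not something the text prescribes). The standard-library function `statistics.pstdev` divides by n, which matches the $s_n$ formula used in this chapter.

```python
from statistics import mean, pstdev

# Offsets of the three measurements from 2, in units of 10^-7
offsets = [15, 12, 9]

centre = mean(offsets)      # 12, i.e. a mean of 2.000 001 2
spread = pstdev(offsets)    # sqrt(6), about 2.45, i.e. 0.000 000 245
print(centre, round(spread, 2), round(centre / spread, 1))
```

The ratio comes out just under 5, in line with the statement that the mean is approximately 5 standard deviations away from 2.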



The difference between 2.000 001 2 and 2 might seem trivial, but it was this
difference which inspired Richard Feynman to create a new theory of physics called
quantum electrodynamics, which did indeed predict this tiny difference from 2! This is
an example of theory driving experiment which in turn creates new theory - the interplay
between theoretical mathematics and reality.
Mixed examination practice 21



Short questions 

A student takes the bus to school every morning. She records the length of 
the time, in minutes, she waits for the bus on 12 randomly chosen days. The 
data is summarised by: 



$\sum_{i=1}^{12} x_i = 49$ and $\sum_{i=1}^{12} x_i^2 = 305.7$



Calculate: 

(a) the mean time she spends waiting for the bus 

(b) the standard deviation of the times. 



[5 marks] 



The average wavelength of light in nanometres emitted by a glowing wire is 
measured on 50 different occasions and the results are given below: 



Wavelength in nm (λ)     Frequency
600-640                  22
640-680                  18
680-720                  x
720-760                  y

The mean of λ is calculated from this table as 653.6.

(a) Find the values of x and y.

(b) Calculate an estimate of the variance. 

(c) Explain why this is only an estimate. 



[9 marks] 



An experiment was conducted on the reaction times of 15 students (t) in 
seconds. The results were that the average reaction time was 0.2 s and the 
variance was 0.0025 s². A 16th student is observed later. She has a reaction
time of 0.16 s. Find the new mean and standard deviation. [8 marks] 

The variance of two data items is k. Find an expression in terms of k for the 
range. [5 marks] 
Long questions

1. The following is the cumulative frequency diagram for the heights of
30 plants, given in centimetres.

[Cumulative frequency diagram: horizontal axis height (h), marked from 5 to 25]

(a) Use the diagram to estimate the median height.

(b) Complete the following frequency table:

Height (h)       Frequency
0 < h < 5        4
5 < h < 10       9
10 < h < 15
15 < h < 20
20 < h < 25

(c) Hence estimate the mean height. [8 marks] 

(© IB Organization 2006) 

2. The following histogram shows the length of the arms of 37 children in a 
classroom in Lithuania, given to the nearest cm. 

[Histogram: frequency (vertical axis, marked 10 to 20) against length of arm in cm (horizontal axis, marked 40 to 90)]
(a) Explain why the bar representing the first group goes below 40.

(b) Complete the following frequency distribution:

Length     Frequency
40-49      12
50-59
60-69
70-79
80-89

(c) Use this data to estimate the mean and the standard deviation of the data. 

(d) Give one reason to explain why the average arm length of all children in 
Lithuania might be different from the value found above. 

3. The frequency distribution of the number of cars in households on a street is 
given below: 



Number of cars     Frequency
1                  a
2                  b

(a) Find an expression for the mean of the number of cars and show that the
variance is given by $\frac{ab}{(a + b)^2}$.

(b) Prove that it is impossible for the mean to equal the variance. 

(c) If the number of households with one car is three times larger than the 
number of households with two cars find the mean and the standard 
deviation in the number of cars. 
continued . . .

Use the fact that the probabilities add up to 1:

0.7k + 1.2k + 1.44k + 0.84k = 1
∴ k = 0.239

We can now calculate all the probabilities. We are not asked for exact values,
so round them to 3SF:

x            1         2         3         4
P(X = x)     0.167     0.287     0.344     0.201
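The normalisation step can be sketched in Python. This reads the four coefficients of k as 0.7, 1.2, 1.44 and 0.84 (the last is the value consistent with k = 0.239 and with the tabulated probabilities); the variable names are illustrative only.

```python
# Coefficients of k in the four probabilities P(X = 1), ..., P(X = 4)
coefficients = [0.7, 1.2, 1.44, 0.84]

# The probabilities must add up to 1, so k is the reciprocal of the total.
k = 1 / sum(coefficients)
probabilities = [round(c * k, 3) for c in coefficients]
print(round(k, 3), probabilities)
```

Rounding each probability only at the end, from the unrounded k, keeps the table values accurate to 3SF.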



One of the most obvious questions to ask about a random 
variable is what value it is most likely to have. This value is 
called the mode. The random variable X in the above example 
has mode 3; the most likely number of chocolates you will win 
is three. A random variable may not have a mode (for example, 
the outcomes of a fair die are all equally likely) or it may have 
more than one mode. In particular, if the largest probability 
corresponds to two of the outcomes, the random variable is 
called bimodal. 

Another question we could ask is, if we were to play the above 
game many times, on average how many chocolates would we 
expect to win? The answer is not necessarily the same as the 
most likely outcome. We will see how to answer this question in 
the next section. 



Exercise 23A 



1. For each of the following, draw out a table to represent the 
probability distribution of the random variable described: 

(a) A fair coin is thrown four times. The random variable 
W is the number of tails obtained. 

(b) Two fair dice are thrown. The random variable D is the 
difference between the larger and the smaller score, or 
zero if they are the same. 

(c) A fair die is thrown once. The random variable X is 
calculated as half the result if the die shows an even 
number, or one higher than the result if the die shows 
an odd number. 

(d) A bag contains six red and three green counters. Two 
counters are drawn at random from the bag without 
replacement. G is the number of green counters 
remaining in the bag. 



In this exercise you will need to use ideas from chapter 22,
particularly tree diagrams. For Question 2(c) you may want to look
at chapter 7 on Geometric sequences.
(e) Karl picks a card at random from a standard pack of
52 cards. If he draws a diamond, he stops; otherwise, 
he replaces the card and continues to draw cards at 
random, with replacement, until he has either drawn a 
diamond or has drawn a total of 4 cards. The random 
variable C is the total number of cards drawn. 

(f) Two fair four-sided spinners, each labelled 1, 2, 3 and 4, 
are spun. The random variable X is the product of the 
two values shown. 

2. Find the missing value k: 

(a) (i)
x            3       7       9       11
P(X = x)     1/2     1/4     1/8     k

(ii)
x            5       6       7       10
P(X = x)     0.2     0.3     k       0.5

(b) (i) $P(Y = y) = ky$ for y = 1, 2, 3, 4

(ii) $P(X = x) = \frac{k}{x}$ for x = 1, 2, 3, 4

(c) (i) $P(X = x) = k(0.1)^x$ for $x \in \mathbb{N}$

(ii) $P(R = r) = k(0.9)^r$ for $r \in \mathbb{N}$

In a game a player rolls a biased four-sided die. The 
probability of each possible score is shown below. 



Score           1       2       3       4
Probability     1/3     1/4     k       1/5



Find the probability that the total score is four after 

two rolls. [5 marks] 



746 Topic 5: Probability and statistics 



© Cambridge University Press 2012



Expectation, median and variance 
of a discrete random variable 

The expectation of a random variable is a value which 
represents the mean result if the variable were to be repeatedly 
measured an infinite number of times. It is a representation of 
the average value of the random variable. 

KEY POINT 23.2 



The expected value of a discrete random variable X is 
written E(X) and calculated as: 

$E(X) = \sum_x x\,P(X = x)$




Worked example 23.3 



The random variable X has probability distribution as shown in the table below. Calculate E(X). 



x            1        2       3        4       5       6
P(X = x)     1/10     1/4     1/10     1/4     1/5     1/10

Apply the formula:

$E(X) = 1 \times \frac{1}{10} + 2 \times \frac{1}{4} + 3 \times \frac{1}{10} + 4 \times \frac{1}{4} + 5 \times \frac{1}{5} + 6 \times \frac{1}{10} = 3.5$
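As a quick check of this calculation, here is a minimal Python sketch using exact fractions. The dictionary layout is just one convenient way to hold a probability distribution, not a convention from the text.

```python
from fractions import Fraction as F

# Probability distribution from Worked example 23.3: value -> probability
dist = {1: F(1, 10), 2: F(1, 4), 3: F(1, 10),
        4: F(1, 4), 5: F(1, 5), 6: F(1, 10)}

total = sum(dist.values())                        # should be exactly 1
expectation = sum(x * p for x, p in dist.items())
print(total, expectation)
```

Using `Fraction` avoids rounding, so the result is the exact value 7/2, which the variable X itself can never take.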




Just as the mean of a set of integers could be fractional, so the 
expectation of a random variable need not be a value which the 
variable can itself take. 

To find the median of a discrete random variable we use the 
defining property of the median - that half of the data should 
fall below it. In the context of a probability distribution this 
means that: 

Median, m, is the smallest value of X for which $P(X \le m)$ is
more than $\frac{1}{2}$. If there is a value m such that $P(X \le m) = \frac{1}{2}$ then
the median is the mean of this value and the next largest
value of X.

Probabilities of the form $P(X \le x)$, which give the probability of
being less than or equal to a certain value, are called cumulative
probabilities.
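The median rule can be sketched in Python by accumulating probabilities until the running total reaches one half. The function name `median` and the list-of-pairs layout are illustrative assumptions; the two distributions used are the one from Worked example 23.3 and a fair six-sided die.

```python
from fractions import Fraction

def median(dist):
    """dist: list of (value, probability) pairs, sorted by value."""
    half = Fraction(1, 2)
    cumulative = Fraction(0)
    for i, (x, p) in enumerate(dist):
        cumulative += p                    # this is P(X <= x)
        if cumulative == half:             # exactly half: average with next value
            return Fraction(x + dist[i + 1][0], 2)
        if cumulative > half:              # first value to pass one half
            return x

example_23_3 = [(1, Fraction(1, 10)), (2, Fraction(1, 4)), (3, Fraction(1, 10)),
                (4, Fraction(1, 4)), (5, Fraction(1, 5)), (6, Fraction(1, 10))]
fair_die = [(x, Fraction(1, 6)) for x in range(1, 7)]
print(median(example_23_3), median(fair_die))
```

Exact fractions are used so that the test for a cumulative probability of exactly one half is reliable; with floating-point probabilities that comparison could fail.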
Worked example 23.4

Find the median of the probability distribution below:

x            1       3       6       8
P(X = x)     0.2     0.4     0.3     0.1

To find the median, evaluate the probability of being at or below each value
until you get above 0.5:

$P(X \le 1) = 0.2$

$P(X \le 3) = 0.6$

Therefore the median is 3.



In the above example, if the distribution had been

x            1       3       6       8
P(X = x)     0.2     0.3     0.4     0.1

then $P(X \le 3)$ is exactly 0.5. The median is the average of 3 and
6, so it is 4.5.

As well as knowing the expectation and median, we may also
be interested in how far away from the average we can expect
an outcome to be. The variance of a random variable is a value
representing the degree of variation that would be seen if the
variable were to be repeatedly measured an infinite number of
times. It is related to how spread out the variable is.

This is the same idea as the variance of the set of data from
Section 21B.

Standard deviation is a much more meaningful representation of the
spread of the variable. So why do we bother with variance at all? The answer
is purely to do with mathematical elegance. If you do the statistics option
(Topic option 7) you will see that the algebra of variance is far neater than the
algebra of standard deviations.

KEY POINT 23.3 

The variance of a random variable X is written Var (X) and 
is calculated as 



$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
where $E(X^2) = \sum_x x^2\,P(X = x)$

This formula is often quoted as 'the mean of the squares minus
the square of the mean'.

The formula booklet also shows the alternative formula,
$E(X - \mu)^2$, but this is hardly ever used.
Worked example 23.5

Calculate Var(X) for the probability distribution in Worked example 23.3.

Find the expectation:

From above, E(X) = 3.5

Apply the values from the distribution:

$E(X^2) = 1^2 \times \frac{1}{10} + 2^2 \times \frac{1}{4} + 3^2 \times \frac{1}{10} + 4^2 \times \frac{1}{4} + 5^2 \times \frac{1}{5} + 6^2 \times \frac{1}{10} = 14.6$

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 14.6 - 12.25 = 2.35$
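The same calculation can be written as a short Python sketch using 'the mean of the squares minus the square of the mean'. Exact fractions are used, and the variable names are illustrative.

```python
from fractions import Fraction as F

# Probability distribution from Worked example 23.3: value -> probability
dist = {1: F(1, 10), 2: F(1, 4), 3: F(1, 10),
        4: F(1, 4), 5: F(1, 5), 6: F(1, 10)}

e_x = sum(x * p for x, p in dist.items())        # E(X)
e_x2 = sum(x * x * p for x, p in dist.items())   # E(X^2)
variance = e_x2 - e_x ** 2
print(e_x, e_x2, variance, float(variance))
```

The exact result 47/20 is the 2.35 found above.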



Exercise 23B 

1. Calculate the expectation, median and variance of each of the 
following random variables: 



(a) (i)
x            1       2       3       4
P(X = x)     0.4     0.3     0.2     0.1

(ii)
w            8       9       10      11
P(W = w)     0.4     0.3     0.2     0.1



(b) (i) P(X = x) = —, x = 1,2,3 

(ii) $P(X = x) = \frac{1}{x}$, x = 2, 3, 6

A discrete random variable X is given by 

P(X = x) = k(x + 1) for x = 2, 3, 4, 5, 6.

(a) Show that k = 0.04.

(b) Find E(X). [5 marks]

The discrete random variable V has the probability distribution
shown below and E(V) = 6.1. Find the value of k and the median
of V.

v            1       2       5       8       k
P(V = v)     0.2     0.3     0.1     0.1     0.3



[6 marks] 
A discrete random variable X has its probability mass function
given by

P(X = x) = k(x + 3), where x is 0, 1, 2, 3.

(a) Show that $k = \frac{1}{18}$.

(b) Find the exact value of E(X). [6 marks]

The probability distribution of a discrete random variable X is
defined by:

P(X = x) = kx(4 - x), x = 1, 2, 3

(a) Find the value of k.

(b) Find E(X). [6 marks]

A fair six-sided die, with sides numbered 1, 1, 2, 2, 2, 5 is 
thrown. Find the mean and variance of the score. [6 marks] 

The table below shows the probability distribution of a discrete 
random variable X. 



x            0       1       2       3
P(X = x)     0.1     p       q       0.2

(a) Given that E(X) = 1.5, find the values of p and q.

(b) Calculate Var(X). [9 marks] 

A biased die with four faces is used in a game. A player pays 
5 counters to roll the die. The table below shows the possible 
scores on the die, the probability of each score and the number 
of counters the player wins for each score. 



Score                                  1       2       3       4
Probability                            1/2     1/4     1/5     1/20
Number of counters player receives     4       5       15      n











Find the value of n in order for the player to get an expected 
return of 3.25 counters per roll. [5 marks] 

In a game a player pays an entrance fee of $n. He then selects 
one number from 1, 2, 3 or 4 and rolls three standard dice. 

If his chosen number appears on all three dice he wins four 
times his entrance fee. 

If his number appears on exactly two of the dice he wins three 
times the entrance fee. 
If his number appears on exactly one die he wins $1.

If his number does not appear on any of the dice he wins
nothing.

(a) Copy and complete the probability table below

Profit ($)      -n       ____      2n       3n
Probability     ____     27/64     ____     ____







(b) The game organiser wants to make a profit over many plays 
of the game. Given that he must charge a whole number 
of cents, what is the minimum amount the organiser must 
charge? [10 marks] 



The binomial distribution 

Some discrete probability distributions are met so often that 
they have been given names and formal notation. One of the 
most important of these is the binomial distribution. There are 
several others, some of which you will meet in this chapter and 
some if you study the statistics option (Topic option 7). 

A binomial distribution occurs in situations where you have
a set number of experiments (or 'trials'), each of which has
two possible outcomes. The number of trials is usually
denoted n. One outcome is conventionally called a 'success'
and the other a 'failure'. The probability of success is denoted p.
If the probability of success in a trial is constant, and trials are
conducted independently of each other, then the number of
successes can be modelled using the binomial distribution.

The symbol ~ is used to denote the concept 'follows this
distribution', and one or two letter abbreviations are used for
the standard distributions. So if a random variable X follows the
binomial distribution with n trials and probability of success p,
we would write X ~ B(n, p).

So what is this distribution? Let us consider a specific example:
suppose a die is rolled four times, what is the probability of
getting exactly two fives? There are four trials so n = 4 and if we
label a five as a success then $p = \frac{1}{6}$. The probability of a failure is
therefore $\frac{5}{6}$.

One way of getting two fives is if the first two times we get a five
and the last two times we get something else. The probability
of this happening is $\frac{1}{6} \times \frac{1}{6} \times \frac{5}{6} \times \frac{5}{6}$. But this is not the only way
in which two fives can occur. The two fives may be first and
third or second and fourth. In fact, we have to consider all the
ways in which we pick two trials out of the four for the 5 to
occur. This can happen in $\binom{4}{2}$ ways. Each of them has the same
probability as the first case. If X is the random variable 'number
of 5s thrown when four dice are rolled' then we can say that:

$P(X = 2) = \binom{4}{2} \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^2$

Counting the number of possible selections was discussed in chapter 1.
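The counting argument above can be turned into a small Python sketch. The function name `binomial_pmf` is hypothetical; `math.comb` is the standard-library binomial coefficient, and `Fraction` keeps the answer exact.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, x):
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) counts the ways of choosing which x trials are successes;
    # each such arrangement has probability p^x (1 - p)^(n - x).
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

one_arrangement = Fraction(1, 6) ** 2 * Fraction(5, 6) ** 2
two_fives = binomial_pmf(4, Fraction(1, 6), 2)
print(comb(4, 2), one_arrangement, two_fives, float(two_fives))
```

There are 6 arrangements, each with probability 25/1296, giving 25/216 (about 0.116) for exactly two fives in four rolls.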



The useful thing about identifying a binomial distribution is 
that you can then apply standard results without having to go 
through this argument every time. In particular, the expectation 
and variance of the binomial distribution can just be quoted 
using the formulae below. The proofs of these are beyond what 
is expected in the International Baccalaureate®, but if you are 
interested they are on Fill-in proof 25 'Expectation and variance
of the binomial distribution' on the CD-ROM.

KEY POINT 23.4

Standard results of the binomial distribution

Statement of distribution     X ~ B(n, p)
Probability formula           $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$ for x = 0, 1, 2, ..., n
Expectation (E(X))            np
Variance (Var(X))             np(1 - p)

(Note: in the Formula booklet, the expectation is referred
to as the mean.)
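Although the proofs are beyond the syllabus, the two standard results can be checked numerically for any particular case. The sketch below does this for the hypothetical example B(4, 1/6) by building the whole distribution from the probability formula and applying the general definitions of expectation and variance; `binomial_pmf` is an illustrative helper name.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, Fraction(1, 6)
probs = [binomial_pmf(n, p, x) for x in range(n + 1)]

# General definitions: E(X) = sum of x P(X = x), Var(X) = E(X^2) - [E(X)]^2
expectation = sum(x * q for x, q in enumerate(probs))
variance = sum(x * x * q for x, q in enumerate(probs)) - expectation ** 2
print(sum(probs), expectation, variance)
```

This is a check for one choice of n and p, not a proof; it agrees with np = 2/3 and np(1 - p) = 5/9.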



Worked example 23.6 



Rohir has a 30% chance of correctly answering a multiple-choice question. In a test there are
ten questions.

(a) What is the probability that Rohir gets exactly four of them correct? Give your answer to
five significant figures.

(b) What is the probability that Rohir gets at least one correct in the first five questions?

(c) Suggest which requirements for a binomial distribution might not be satisfied in
this situation.
continued . . .

(a) Define the random variable if not already defined in the question:

Let X be the number of correct answers in the first ten questions.

Give the probability distribution, checking that the conditions are met:

X ~ B(10, 0.3)

Express the formula for the probability required, and calculate the answer:

$P(X = 4) = \binom{10}{4} (0.3)^4 (0.7)^6 = 0.20012$ (5SF)

(b) Define the random variable if not already defined in the question:

Let Y be the number of correct answers in the first five questions.

Give the probability distribution:

Y ~ B(5, 0.3)

Write down the probability required. We are interested in $Y \ge 1$, which
means that Y = 1, 2, 3, 4 or 5. Remember that a quicker way to do the
calculation is to find $1 - P(Y < 1)$:

$P(Y \ge 1) = 1 - P(Y = 0)$

Express the formula for the probability required, and calculate the answer:

$= 1 - \binom{5}{0} (0.3)^0 (0.7)^5 = 0.832$ (3SF)

(c) Consider the requirements for the distribution. Binomial requires:

• two outcomes at each trial

• constant probability of success in each trial

• trial results independent of each other

Identify a requirement which is failed in this context: there are
two outcomes, and trials are independent (answering one
question does not make it easier or harder to answer another).

Not all questions are of the same difficulty, so there
might not be a constant probability of success.
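Parts (a) and (b) can be checked with a few lines of Python. This is only a sketch of the probability formula from Key point 23.4, with `binomial_pmf` as a hypothetical helper name.

```python
from math import comb

def binomial_pmf(n, p, x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# (a) exactly four correct out of ten, X ~ B(10, 0.3)
part_a = binomial_pmf(10, 0.3, 4)

# (b) at least one correct out of five, Y ~ B(5, 0.3), via the complement
part_b = 1 - binomial_pmf(5, 0.3, 0)

print(round(part_a, 5), round(part_b, 3))
```

Part (b) uses the complement because 'at least one' would otherwise need five separate probabilities added together; it evaluates to 0.832 to 3SF.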
If you need to find a probability of a range of successes, you
could in theory add up the probabilities of individual outcomes. 
This can be very time consuming, so your calculator has a 
function giving the probability of getting up to and including 
any number of successes. 



EXAM HINT

Most GDCs can calculate binomial probabilities
automatically given n and p; see Calculator sheet 13 on
the CD-ROM. But you may also be tested on applying the
formula, which is given in the Formula booklet.



Worked example 23.7

Random variable X has distribution B(15, 0.6). Find $P(5 < X \le 10)$.

The calculator can give us probabilities of the form $P(X \le k)$:

X ~ B(15, 0.6)

$P(5 < X \le 10) = P(X \le 10) - P(X \le 5)$

= 0.7827 - 0.0338

= 0.749 (3SF) (from GDC)
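For readers without a GDC to hand, the cumulative probabilities in this example can be reproduced by summing the probability formula. The helper names `binomial_pmf` and `binomial_cdf` below are hypothetical, not calculator or library functions.

```python
from math import comb

def binomial_pmf(n, p, x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(n, p, k):
    """P(X <= k): add the probabilities of 0, 1, ..., k successes."""
    return sum(binomial_pmf(n, p, x) for x in range(k + 1))

upper = binomial_cdf(15, 0.6, 10)   # P(X <= 10)
lower = binomial_cdf(15, 0.6, 5)    # P(X <= 5)
print(round(upper, 4), round(lower, 4), round(upper - lower, 3))
```

Subtracting P(X <= 5) removes the outcomes 0 to 5, leaving exactly the outcomes 6 to 10; the two cumulative values are about 0.7827 and 0.0338.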





EXAM HINT 

Even when you are using a calculator to find 
probabilities, you should still use correct mathematical 
notation (not calculator notation) in your answer. You 
do not need to explain how you did things on the 
calculator - just state the distribution you used, the 
probabilities calculated, and give the answer (usually to 
3 significant figures). 



Exercise 23C 



Remember to round your answer to three significant figures 
when using the calculator. 

1. The random variable X has a binomial distribution with 
n = 8 and p = 0.2. Calculate:

(a) (i) P(X = 3) (ii) P(X = 4) 
(b) (i) $P(X \le 3)$ (ii) $P(X \le 2)$

(c) (i) $P(X > 3)$ (ii) $P(X > 4)$

(d) (i) $P(X < 5)$ (ii) $P(X < 3)$

(e) (i) $P(X \ge 3)$ (ii) $P(X \ge 1)$

(f) (i) $P(3 < X < 6)$ (ii) $P(1 < X < 4)$



2. Given that Y ~ B(5, 0.5), find the exact value of: 



(a) (i) P(Y = 1) (ii) P(Y = 0)

(b) (i) P(Y > 1) (ii) P(Y < 1)

(c) (i) P(Y > 4) (ii) P(Y < 3)



3. Find the mean and standard deviation of each of the 
following variables: 

(a) (i) y~B^100, (ii) X~B^16, £] 

(b) (i) X ~ B(15, 0.3) (ii) Y ~ B(20, 0.35)

(c) (i) Z-B^i-l, (ii) X~B^w, 

4. (a) Jake beats Marco at chess in 70% of their games. 

Assuming that this probability is constant and that the 
results of games are independent of each other, what is 
the probability that Jake will beat Marco in at least 16 of 
their next 20 games? 

(b) On a television channel, the news is shown at the same 
time each day; the probability that Salia watches the 
news on a given day is 0.35. Calculate the probability 
that on 5 consecutive days she watches the news on 
exactly 3 days. 

(c) Sandy is playing a computer game and needs to 
accomplish a difficult task at least three times in five 
attempts in order to pass the level. There is a 1 in 2 
chance that he accomplishes the task each time he tries, 
unaffected by how he has done before. What is the 
probability that he will pass to the next level? 

15% of students at a large school travel by bus. A random 
sample of 20 students is taken. 

(a) Explain why the number of students in the sample who 
travel by bus is only approximately a binomial distribution. 
(b) Use the binomial distribution to estimate the
probability that exactly five of the students 
travel by bus. [3 marks] 

A coin is biased so that when it is tossed the probability of
obtaining heads is $\frac{2}{3}$. The coin is tossed 4050 times. Let X
be the number of heads obtained. Find:

(a) the mean of X

(b) the standard deviation of X. [3 marks]

A biology test consists of eight multiple-choice questions.
Each question has four answers, only one of which is 
correct. At least five correct answers are required to pass 
the test. Sheila does not know the answers to any of the 
questions, so answers each question at random. 

(a) What is the probability that Sheila answers exactly five 
questions correctly? 

(b) What is the expected number of correct answers 
Sheila will give? 

(c) What is the standard deviation in the number of 
correct answers Sheila will give? 

(d) What is the probability that Sheila manages to pass the 
test? [7 marks] 

0.8% of people in the country have a particular cold virus 
at any time. On a single day, a doctor sees 80 patients. 

(a) What is the probability that exactly 2 of them have the 
virus? 

(b) What is the probability that 3 or more of them have 
the virus? 

(c) State an assumption you have made in these 
calculations. [5 marks] 

Given that Y ~ B(12,0.4): 

(a) Find the expected mean of Y. 

(b) Find the mode of Y [3 marks] 

On a fair die, which is more likely: rolling 3 sixes in 
4 throws or rolling a five or a six in 5 out of 6 throws? [6 marks] 

Over a one month period, Ava and Sven play a total 
of n games of tennis. The probability that Ava wins 
any game is 0.4. The result of each game played is 
independent of any other game played. Let X denote 
the number of games won by Ava over a one month period. 

(a) Find an expression for P(X = 2) in terms of n. 

(b) If the probability that Ava wins two games is 0.121 
correct to three decimal places, find the value of n. 
[5 marks] 

A coin is biased so that the probability of it showing 
tails is p. The coin is tossed n times. Let X be a random 
variable representing the number of tails. It is known 
that the mean of X is 19.5 and the variance is 6.825. 
Find the values of n and p. [5 marks] 

A die is biased so that the probability of rolling a 6 is p. 
If the probability of rolling 2 sixes in 12 throws is 0.283 
(to three significant figures), find the possible values of 
p correct to two decimal places. [5 marks] 

In an experiment, a trial is repeated n times. The trials 
are independent and the probability p of success in 
each trial is constant. Let X be the number of successes 
in the n trials. The mean of X is 12 and the standard 
deviation is 2. Find n and p. [5 marks] 

X is a binomial random variable, where the number of 
trials is 4 and the probability of success of each trial is p. 
Find the possible values of p if P(X = 3) = 0.3087. [5 marks] 

X is a binomial random variable, where the number of 
trials is 4 and the probability of success of each trial is 
p. Find the possible values of p if $P(X = 2) = \frac{96}{625}$. [6 marks] 



Question 10 is the problem which was posed to Pierre de Fermat in 1654 by a 
professional gambler who could not understand why he was losing. It inspired Fermat 
(with the assistance of Pascal) to set up probability as a rigorous mathematical 
discipline. 

The Poisson distribution 

When you are waiting for a bus there are at any given moment 
two possible outcomes - it either arrives or it does not. We can 
try modelling this situation using a binomial distribution, but it 
is not clear what an individual trial is. Instead we have a rate of 
success - the number of buses that arrive in a fixed time period. 

There are many situations in which we know the rate of 
events within a given space or time, in contexts ranging from 
commercial (such as the number of calls through a telephone 
exchange per minute) to biological (such as the number of 
clover plants seen per square metre in a pasture). Where the 
events occur singly (one at a time) and can be considered 
independent of each other (so that the probability of each event 
is not affected by what has already happened), the number of 
events in a fixed space or time interval can be modelled using 
the Poisson distribution. This distribution is fully defined once we 
know the rate of success, which is conventionally called m. 

KEY POINT 23.5 

Standard results of the Poisson distribution 



Statement of distribution:   X ~ Po(m) 

Probability formula:         $P(X = x) = \dfrac{e^{-m} m^x}{x!}$ for x = 0, 1, 2, ... 

Expectation E(X):            m 

Variance Var(X):             m 



(Note: in the Formula booklet, E(X) is called the mean) 



Worked example 23.8 



Recordable accidents occur in a factory at an average rate of 7 every year, independently of 
each other. Find the probability that in a given year exactly 3 recordable accidents occurred. 



Define the random variable: 
Let X be the number of accidents in a year. 

Give the probability distribution: 
X ~ Po(7) 

Write down the probability required, and calculate the answer: 

$P(X = 3) = \dfrac{e^{-7}\, 7^3}{3!} = 0.0521$ (3SF) 
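The probability formula in Key point 23.5 can be evaluated directly as a check on the calculator. The following is a minimal sketch; the function name `poisson_pmf` is an illustrative choice, not something from the Formula booklet.

```python
import math

def poisson_pmf(x: int, m: float) -> float:
    """P(X = x) for X ~ Po(m), using e^(-m) * m^x / x!."""
    return math.exp(-m) * m**x / math.factorial(x)

# Worked example 23.8: X ~ Po(7), exactly 3 accidents in a year.
p_three = poisson_pmf(3, 7)
print(round(p_three, 4))  # 0.0521
```

Since the mean is 7, a count as low as 3 is fairly unlikely, which is why the probability is only about 5%.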
The Poisson distribution is scalable. If the number of butterflies 
seen on a flower in 10 minutes follows a Poisson distribution 
with mean (expectation) m, then the number of butterflies 
seen on a flower in 20 minutes follows a Poisson distribution 
with mean 2m; the number of butterflies seen on a flower in 5 
minutes follows a Poisson distribution with mean $\frac{m}{2}$. 



EXAM HINT 

See Calculator sheet 13 on the CD-ROM. Your GDC 
can calculate Poisson probabilities and cumulative 
probabilities, but you may be explicitly asked to use the 
formula. Remember to round your answers to 3SF. 




Worked example 23.9 

If there are an average of 12 buses per hour arriving at a bus stop, find the probability that there 
are more than 6 buses in 30 minutes. 

Define the random variable: 
Let X be the number of buses in 30 minutes. 

Give the probability distribution: 
X ~ Po(6) 

Write down the probability required. To use the calculator we must relate it 
to $P(X \le k)$: 

$P(X > 6) = 1 - P(X \le 6)$ 

= 0.394 (3SF) from GDC 
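The two steps used here, rescaling the rate to the time window and taking the complement of a cumulative probability, can be sketched as follows. This is only an illustration under the assumptions of the worked example (12 buses per hour, a 30 minute window); the helper `poisson_cdf` is a hypothetical name, and a GDC or statistics package would normally do this sum for you.

```python
import math

def poisson_cdf(k: int, m: float) -> float:
    """P(X <= k) for X ~ Po(m), found by summing the probability formula."""
    return sum(math.exp(-m) * m**x / math.factorial(x) for x in range(k + 1))

rate_per_hour = 12
m = rate_per_hour * 30 / 60        # scale the mean to a 30 minute window: m = 6

# "More than 6" is the complement of "6 or fewer".
p_more_than_six = 1 - poisson_cdf(6, m)
print(round(p_more_than_six, 3))   # 0.394
```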



Exercise 23D 

1. State the distribution of the variable in each of the following 
situations: 

(a) Cars pass under a motorway bridge at an average rate of 
6 per 10 second period. 

(i) the number of cars passing under the bridge in 
1 minute 

(ii) the number of cars passing under the bridge in 
15 seconds 

(b) Leaks occur in water pipes at an average rate of 12 per 
kilometre. 

(i) the number of leaks in 200 m 

(ii) the number of leaks in 10 km 

(c) A widget machine manufactures on average 96 functional 
widgets out of 100. 

(i) the number of faulty widgets in a sample of 10 

(ii) the number of functioning widgets in a sample of 20 

(d) 12 worms are found on average in a 1 m² area of a garden. 

(i) the number of worms found in a 0.3 m² area 

(ii) the number of worms found in a 2 m by 2 m area 



2. Calculate the following probabilities: 

(a) If X ~ Po(2): 
(i) P(X = 3) (ii) P(X = 1) 

(b) If Y ~ Po(1.4): 
(i) P(Y < 3) (ii) P(Y < 1) 

(c) If Z ~ Po(7.9): 
(i) P(Z < 6) (ii) P(Z < 10) 

(d) If X ~ Po(5.9): 
(i) P(X > 3) (ii) P(X > 1) 

(e) If X ~ Po(11.4): 
(i) P(8 < X < 11) (ii) P(8 < X < 12) 



3. A random variable X follows a Poisson distribution with mean 
1.7. Copy and complete the following table of probabilities, 
giving results to 3 significant figures: 

x           0        1        2        3        4        >4 
P(X = x)    0.183 













From a particular observatory, shooting stars are observed 
in the night sky at an average rate of one every 5 minutes. 
Assuming that this rate is constant and that shooting stars occur 
(and are observed) independently of each other, what is the 
probability that more than 20 are seen over a period of 
1 hour? [4 marks] 



When examining blood from a healthy individual under a 
microscope, a haematologist knows that on average he should 
see 4 white blood cells in each high power field. Find the 
probability that blood from a healthy individual will show: 

(a) 7 white blood cells in a single high power field 

(b) a total of 28 white blood cells in 6 high power fields, selected 
independently. [5 marks] 

Salah is sowing flower seeds in his garden. He scatters seeds 
randomly so that the number of seeds falling on any particular 
region is a random variable with a Poisson distribution, with 
mean value proportional to the area. He will sow fifty thousand 
seeds over an area of 2 m². 

(a) Calculate the expected number of seeds falling on a 1 cm² 
region. 

(b) Calculate the probability that a given 1 cm² area receives 
no seeds. [4 marks] 

A wire manufacturer is looking for flaws. Experience suggests 
that there are on average 1.8 flaws per metre in the wire. 

(a) Determine the probability that there is exactly 1 flaw in 

1 metre of the wire. 

(b) Determine the probability that there is at least one flaw in 

2 metres of the wire. [5 marks] 

The random variable X has a Poisson distribution with mean 5. 
Calculate: 

(a) P(X<5) 

(b) P(3<X<5) 

(c) P(X ≠ 4) 

(d) P(3<X<5|X<5) [8 marks] 

Patients arrive at random at an emergency room in a hospital 
at the rate of 14 per hour throughout the day. 

(a) Find the probability that exactly 4 patients will arrive at the 
emergency room between 18:00 and 18:15. 

(b) Given that fewer than 15 patients arrive in one hour, 

find the probability that more than 12 arrive. [6 marks] 

The number of eagles observed in a forest in one day follows a 
Poisson distribution with mean 1.4. 

(a) Find the probability that more than three eagles will be 
observed on a given day. 

(b) Given that at least one eagle is observed on a particular 
day, find the probability that exactly two eagles are seen 
that day. [ 6 marks ] 

The random variable X follows a Poisson distribution. Given 
that P(X > 1) = 0.4, find: 

(a) the mean of the distribution 

(b) P(X > 2). [5 marks] 

The random variable X is Poisson distributed with mean m and 
satisfies P(X = 3) = P(X < 3). 

(a) Find the value of m, correct to four decimal places. 

(b) For this value of m evaluate P(2 < X < 4). [6 marks] 

Let X be a random variable with a Poisson distribution, such 
that P(X > 2) = 0.3. Find P(X < 2). [5 marks] 

The number of emails you receive per day follows a Poisson 
distribution with mean 6. Let D be the number of emails 
received in one day and W the number of emails received in a 
week. 

(a) Calculate P(D = 6) and P(W = 42). 

(b) Find the probability that you receive 6 emails every day in 
a seven-day week. 

(c) Explain why this is not the same as P(W = 42). [8 marks] 

The number of mistakes a teacher makes whilst marking 
homework has a Poisson distribution with a mean of 1.6 errors 
per piece of homework. 

(a) Find the probability that there are at least two marking 
errors in a randomly chosen piece of homework. 

(b) Find the most likely number of marking errors occurring 
in a piece of homework. Justify your answer. 

(c) Find the probability that in a class of 12 pupils fewer than 
half of them have errors in their marking. [9 marks] 

A car company has two limousines that it hires out by the day. 
The number of requests per day has a Poisson distribution with 
mean 1.3 requests per day. 

(a) Find the probability that neither limousine is hired. 

(b) Find the probability that some requests have to be denied. 

(c) If each limousine is to be equally used, on how many days 
in a period of 365 days would you expect a particular 
limousine to be in use? [8 marks] 

A shop has 4 copies of the book 'Ballroom Dancing' delivered 
each week. The demand for the book follows a Poisson 
distribution with mean 3.2 requests per week. 

(a) Calculate the probability that the shop cannot meet the 
demand in a given week. 

(b) Find the most probable number of books sold in one week. 

(c) Find the expected number of books sold in one week. 

(d) Determine the smallest number of copies of the book that 
should be ordered each week to ensure that the demand is 
met with a probability of at least 98%. [8 marks] 

The random variable X follows a Poisson distribution with mean 
λ. If P(X = 2) = P(X = 0) + P(X = 1), find the exact value of λ. 
[4 marks] 

The random variable Y follows a Poisson distribution with mean λ. 

(a) Show that $P(Y = y + 2) = \dfrac{\lambda^2}{(y + 1)(y + 2)}\, P(Y = y)$. 

(b) Given that $\lambda = 6\sqrt{2}$, find the value of y such that 
P(Y = y + 2) = P(Y = y). [4 marks] 



Summary 



A random variable is a quantity whose value depends on chance. A list of all possible 
outcomes and their associated probabilities is called a probability distribution or probability 
mass function. 

The total of all the probabilities of a probability distribution must always equal 1. 

Even though the outcome of any one observation of a random variable is impossible to 
predict with any certainty, the expectation (or the mean) and variance of observations can be 
predicted quite accurately, using: 

$E(X) = \sum_x x\,P(X = x)$ 

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$ 

If there is a fixed number of trials (each with two possible outcomes) with constant and 
independent probability of success in each trial then the number of successes follows a 
Binomial distribution: X ~ B(n,p) . 

If events occur singly, independently and at a constant rate, then the number of events in a 
given period follows a Poisson distribution: X ~ Po(m), where m is the rate of success. 

Once the distribution has been identified then probabilities and statistics for the distribution 
can be immediately quoted: 



Distribution    Notation       P(X = x)                              E(X)    Var(X) 

Binomial        X ~ B(n, p)    $\binom{n}{x} p^x (1 - p)^{n - x}$    np      np(1 - p) 

Poisson         X ~ Po(m)      $\dfrac{e^{-m} m^x}{x!}$              m       m 
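The quoted means and variances can be checked against the general formulae for E(X) and Var(X) by summing over the probability distribution. The sketch below is illustrative only: the parameter values (n = 12, p = 0.4, m = 3.2) are borrowed from the exercises, the function names are hypothetical, and the Poisson sum is truncated at x = 60 on the assumption that the remaining tail is negligible.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, m):
    """P(X = x) for X ~ Po(m)."""
    return math.exp(-m) * m**x / math.factorial(x)

def mean_and_variance(distribution):
    """distribution: list of (x, P(X = x)) pairs; returns (E(X), Var(X))."""
    mean = sum(x * p for x, p in distribution)
    mean_of_square = sum(x * x * p for x, p in distribution)
    return mean, mean_of_square - mean**2

n, p = 12, 0.4
binomial = [(x, binomial_pmf(x, n, p)) for x in range(n + 1)]
print(mean_and_variance(binomial))   # close to (np, np(1 - p)) = (4.8, 2.88)

m = 3.2
poisson = [(x, poisson_pmf(x, m)) for x in range(61)]  # truncated support
print(mean_and_variance(poisson))    # close to (m, m) = (3.2, 3.2)
```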
Worked example 24.1 

A continuous random variable has a pdf: 

$f(x) = \begin{cases} kx^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$ 

(a) Find the value of k. 

(b) Find the probability of X being between 0.2 and 0.6. 

Total area is 1. Area is only found between 0 and 1: 

(a) $1 = \int_0^1 kx^2\,dx = \left[\dfrac{kx^3}{3}\right]_0^1 = \dfrac{k}{3} \iff k = 3$ 

Probability X lies in [a, b] is $\int_a^b f(x)\,dx$: 

(b) $P(0.2 < X < 0.6) = \int_{0.2}^{0.6} 3x^2\,dx = \left[x^3\right]_{0.2}^{0.6} = 0.208$ 
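Both parts of this example reduce to definite integrals, so they can be checked numerically. The sketch below uses composite Simpson's rule as a stand-in for the calculator's integration routine; the helper `simpson` is a hypothetical name introduced for illustration.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n (even) Simpson strips."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# (a) Total area under the pdf must be 1, so k = 1 / (area under x^2 on [0, 1]).
k = 1 / simpson(lambda x: x**2, 0, 1)

# (b) Probability is the area under the pdf between 0.2 and 0.6.
prob = simpson(lambda x: k * x**2, 0.2, 0.6)
print(round(k, 6), round(prob, 3))   # 3.0 0.208
```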




Exercise 24A 

1. For each of these distributions, find the possible values of the 
unknown parameter k: 

(a) (i) $f(x) = kx^3,\ 2 < x < 3$ (ii) $f(x) = k\sqrt{x},\ 1 < x < 4$ 

(b) (i) $f(x) = x^2 + k,\ -1 < x < 2$ (ii) $f(x) = 3x + k,\ -2 < x < 3$ 

(c) (i) $f(x) = e^{kx},\ 0 < x < 2$ (ii) $f(x) = \sin kx,\ 0 < x < \pi$ 

(d) (i) /(*) = -J_, 0<x<l (ii) /(*) = -!- 0<%<1 

(x + fc) x + k 

(e) (i) $f(x) = x^3,\ 0 < x < k$ (ii) $f(x) = 2x - 1,\ 1 < x < k$ 

(f) (i) $f(x) = \dfrac{1}{1 + x},\ k < x < k + 1$ (ii) $f(x) = x^2,\ k < x < 2k$ 

(g) (i) $f(x) = kx^2,\ 0 < x < k$ (ii) $f(x) = x + k,\ 0 < x < k$ 

(h) (i) $f(x) = ke^{-x^2},\ 3 < x < 8$ (ii) $f(x) = k\sin\sqrt{x},\ \pi < x < \pi^2$ 

(i) (i) f(x) = — , \<x<k (ii) /(*) = —?=, < x < 1 



2Vx ' 



2. (a) If $f(x) = \begin{cases} 2 - 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$ 

(i) Find P(0.3 < X < 0.9). (ii) Find P(0 < X < 0.5). 

K 



(b) If /(*) = < 

(i) FindP 



cosx 0<x< 



0 



otherwise 



1 



(ii) FindP 



6 



(c) If $f(x) = \begin{cases} \dfrac{1}{x \ln 10} & 1 < x < 10 \\ 0 & \text{otherwise} \end{cases}$ 

(i) Find P(X > 5). (ii) Find P(X < 3). 



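3. (a) If $f(x) = \begin{cases} 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$ 

(i) Find a if P(X < a) = 0.4. (ii) Find b if P(X < b) = 0.9. 

(b) If $f(x) = \begin{cases} \dfrac{1}{8} & 0 < x < 8 \\ 0 & \text{otherwise} \end{cases}$ 

(i) Find a if P(X > a) = 0.9. (ii) Find b if P(X > b) = 0.5. 

(c) If $f(x) = \begin{cases} \dfrac{x}{16} & 2 < x < 6 \\ 0 & \text{otherwise} \end{cases}$ 

(i) Find a if P(2 + a < X < 6 - a) = 0.8. 

(ii) Find b if P(b < X < b + 1) = 0.25. 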

A model predicts that the angle, G, by which an alpha 
particle is deflected by a nucleus is modelled by: 



$f(g) = \begin{cases} kg^2 & 0 < g < \pi \\ 0 & \text{otherwise} \end{cases}$ 



(a) Find the value of the constant k. 

(b) 10 000 alpha particles are fired at a nucleus. If the model 
is correct, estimate the number of alpha 



particles deflected by less than 



71 



[6 marks] 
A random variable Y has distribution: 

f(y) = \ 

J v 7 [0 otherwise 

Find the exact value of P(Y > 2). [4 marks] 

If the continuous random variable X has a probability density 
function 

$f(x) = \begin{cases} \sec^2 x & 0 < x < \dfrac{\pi}{4} \\ 0 & \text{otherwise} \end{cases}$ 

find the interquartile range of X. [6 marks] 

If $f(x) = \begin{cases} \dfrac{1}{x} & 1 < x < e \\ 0 & \text{otherwise} \end{cases}$ 

(a) Find b in terms of k if $P(b < X < b^2) = k$. 

(b) Find a in terms of k if P(2 - a < X < 2 + a) = k. [7 marks] 



□ lf/(x) = 



[e* k<x<2k 
1 0 otherwise 



Expectation and variance of 
continuous random variables 

The expressions for expectation and variance of continuous random 
variables all involve integration. 

KEY POINT 24.3 

Expectation and variance of continuous random variables: 

$E(X) = \int x f(x)\,dx$ 

$E(X^2) = \int x^2 f(x)\,dx$ 

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$ 

Note: The formulae in the Formula booklet are set out slightly 
differently. 

You may have noticed that the expressions for E(X) and Var(X) 
look similar to those for discrete random variables, 
but with integration signs instead of summation signs. 
This is because there is a link between sums and 
integrals, which you met in chapter 17. You will explore 
this further if you study chapter 28 in the Calculus 
option (Topic option 9). 
Worked example 24.2 

If a continuous random variable has pdf 

$f(x) = \begin{cases} \frac{3}{4}x(2 - x) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$ 

find E(X) and the standard deviation of X. 

We can do the integration on the calculator. 
If you need reminding how, see Calculator skills 
sheet 10 on the CD-ROM. 

$E(X) = \int_0^2 x \times \frac{3}{4}x(2 - x)\,dx = \frac{3}{4}\int_0^2 x^2(2 - x)\,dx = 1$ (from GDC) 

To find the standard deviation we must first 
find Var(X), which requires us to find $E(X^2)$. 

$E(X^2) = \int_0^2 x^2 \times \frac{3}{4}x(2 - x)\,dx = \frac{3}{4}\int_0^2 x^3(2 - x)\,dx = 1.2$ (from GDC) 

$\therefore \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 1.2 - 1^2 = 0.2$ 

standard deviation $= \sqrt{0.2} = 0.447$ (3SF) 
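The integrals in Key point 24.3 can be evaluated numerically in the same way a GDC does. The sketch below assumes the pdf of this example, f(x) = (3/4)x(2 - x) on 0 < x < 2, and uses a hypothetical `simpson` helper in place of the calculator.

```python
import math

def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n (even) Simpson strips."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def pdf(x):
    """pdf from the worked example, valid for 0 < x < 2."""
    return 0.75 * x * (2 - x)

mean = simpson(lambda x: x * pdf(x), 0, 2)               # E(X)
mean_of_square = simpson(lambda x: x**2 * pdf(x), 0, 2)  # E(X^2)
variance = mean_of_square - mean**2
standard_deviation = math.sqrt(variance)
print(round(mean, 3), round(variance, 3), round(standard_deviation, 3))
```

The mean of 1 is what symmetry suggests, since this pdf is symmetric about x = 1.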





The maximum value of f(x) is not necessarily where 
$\dfrac{df}{dx} = 0$, see Section 16H. 



It is also possible to find the median and mode for a continuous 
distribution. 

The defining feature of the median is that half of the data should 
be below this value and half above. The mode is the most likely 
value. We can interpret this in terms of probability. 

KEY POINT 24.4 

The median, m, satisfies 

$\int_{-\infty}^{m} f(x)\,dx = \dfrac{1}{2}$ 

The mode is the value of x at the maximum value of f(x). 
Worked example 24.3 

If 

$f(x) = \begin{cases} \frac{3}{20}(4x^2 - x^3) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$ 

find the median and mode of X. 

Probability of being below the median is $\frac{1}{2}$: 

$\int_0^m \frac{3}{20}(4x^2 - x^3)\,dx = \frac{1}{2} \iff \frac{m^3}{5} - \frac{3m^4}{80} = \frac{1}{2}$ 

This is a quartic equation without any easy 
substitution. Time to use the calculator. 

From GDC: m = 1.52 or 5.24 
However 0 < m < 2, therefore median 
= 1.52 

For the mode, check for a local maximum: 

$\dfrac{df}{dx} = \dfrac{6x}{5} - \dfrac{9x^2}{20} = \dfrac{3x}{20}(8 - 3x) = 0 \iff x = 0 \text{ or } x = \dfrac{8}{3}$ 

Neither stationary point lies inside 0 < x < 2, and $\frac{df}{dx} > 0$ 
throughout this interval, so f(x) is increasing and takes its 
largest values at the upper end of the interval. The mode is 
therefore at x = 2, not at $x = \frac{8}{3}$. 
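The GDC finds the median by solving the quartic numerically. As an illustration of what that involves, the sketch below applies bisection to F(m) - 1/2, where F(m) = m³/5 - 3m⁴/80 is the integral of the pdf from 0 to m. The function names are hypothetical and bisection is just one of several root-finding methods a calculator might use.

```python
def cdf(m):
    """F(m): integral of (3/20)(4x^2 - x^3) from 0 to m, for 0 <= m <= 2."""
    return m**3 / 5 - 3 * m**4 / 80

def bisect(g, lo, hi, tol=1e-10):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(lo) < 0) == (g(mid) < 0):
            lo = mid      # sign change is in the upper half
        else:
            hi = mid      # sign change is in the lower half
    return (lo + hi) / 2

# Search only inside the support 0 < m < 2, which excludes the root near 5.24.
median = bisect(lambda m: cdf(m) - 0.5, 0, 2)
print(round(median, 2))   # 1.52
```

Restricting the search interval to the support of the pdf plays the same role as rejecting m = 5.24 by hand.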
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Exercise 24B 



1. Find E(X), the median of X, the mode of X and Var(X) if X 
has the given probability density function: 

(a) (i) f(x) = 2-2x 0<x<l (ii) f(x) = - 0<x< 

8 

1 2 

(b) (i) f(x)= Kx<10(ii) /(*) = — Kx<2 



x lnlO 



71 



(c) (i) f(x) = cosx 0<x<- (ii) f(x) = e x 0<x<ln2 

(d) (i) /(x) = 4 *>1 (ii) /(*) = 4 



2. (a) Given that E(X) = 1.1, find A: if: 
1 



(i) /(*) = 



1 *c <c 

xlnfc (ii)/(x) = 

0 otherwise 



x > 1 

1 < X < °° 

otherwise 



(b) Given that E(X) = 3, find k if: 



(i) /(*) = 

(ii) /(*) = ■ 



1 



1 



otherwise 
0<jc<(e-l) 



x + k 
0 otherwise 



The continuous random variable X has pdf: 

$f(x) = \begin{cases} \frac{3}{20}(4x^2 - x^3) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$ 

(a) Find the expected mean of X. 

(b) Find the mode of X. [6 marks] 



A continuous random variable B has pdf: 

$f(b) = \begin{cases} ab^2 & 3 < b < 10 \\ 0 & \text{otherwise} \end{cases}$ 

(a) Find the value of the constant a. 

(b) Find E(B). [7 marks] 



A function f is given by: 

$f(y) = \begin{cases} ke^{-ky} & y > 0 \\ 0 & \text{otherwise} \end{cases}$ 

(a) Show that f is a probability density function. 

(b) A random variable Y has distribution given by f(y). 
Find E(Y) in terms of k. [10 marks] 

Y is a continuous random variable with probability density 
function: 

$f(y) = \begin{cases} ay^2 & -k < y < k \\ 0 & \text{otherwise} \end{cases}$ 

(a) Show that $a = \dfrac{3}{2k^3}$. 

(b) Given that Var(Y) = 5 find the exact value of k. [11 marks] 



Given that f(x) = 



e 2 ,x < 



is a probability 



density function find E(X) and prove that Var(X) = 1. [9 marks] 



The normal distribution 

There are many situations where a variable is most likely to 
be close to its average value, and values further away from the 
average become increasingly unlikely. Many such situations can 
be modelled using the normal distribution. 

All that is needed to describe this distribution is its mean (μ) 
and variance (σ²). If a variable follows this distribution we use 
the notation X ~ N(μ, σ²). 

The probability density function (pdf) for the normal 
distribution is quite complicated: 

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ 

This function cannot be integrated in terms of other 
well-known functions, but your calculator can find 
approximate probabilities. 

See Calculator sheet 14 on the CD-ROM. 



You may find it helpful to sketch a diagram to get a visual 
representation of the probability you are trying to find. 

[Figure: normal curve for N(5, 1.2), axis marked at 5 and 6, illustrating P(X < 6)] 

Historically, cumulative probabilities for the 
normal distribution were recorded in tables 
and these are still used if you don't have a graphical 
calculator. As there cannot be separate tables for every 
possible different μ and σ, all values needed to be 
converted into a Z-score, described later. 

The diagrams can also provide a useful check, to see whether you 
should expect the probability to be smaller or greater than 0.5. 

[Figure: normal curves illustrating P(X > 110) and P(18.5 < X < 21)] 



Worked example 24.4 



The average height of people in a town is 170 cm with standard deviation of 10 cm. What is the 
probability that a randomly selected resident: 

(a) is less than 165 cm tall? 

(b) is between 180 cm and 190 cm tall? 

(c) is over 176 cm tall? 



State the distribution used: 
X is the crv 'height of a town resident', so 
X ~ N(170, 100) 

State the probability to be found and use the calculator: 

(a) [Figure: X ~ N(170, 100), axis marked at 165 and 170, illustrating P(X < 165)] 

P(X < 165) = 0.309 (3SF) (from GDC) 

State the probability to be found and use the calculator: 

(b) [Figure: X ~ N(170, 100), axis marked at 170, 180 and 190, illustrating P(180 < X < 190)] 

P(180 < X < 190) = 0.136 (3SF) (from GDC) 

State the probability to be found and use the calculator: 

(c) [Figure: X ~ N(170, 100), axis marked at 170 and 176, illustrating P(X > 176)] 

P(X > 176) = 0.274 (3SF) (from GDC) 
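If a GDC is not to hand, the same three probabilities can be reproduced with Python's standard library. This is a sketch rather than a replacement for the calculator method; note that `statistics.NormalDist` takes the standard deviation (10), not the variance (100) that appears in the N(170, 100) notation.

```python
from statistics import NormalDist

# X ~ N(170, 100): mean 170 and variance 100, so sigma = 10.
heights = NormalDist(mu=170, sigma=10)

p_a = heights.cdf(165)                      # P(X < 165)
p_b = heights.cdf(190) - heights.cdf(180)   # P(180 < X < 190)
p_c = 1 - heights.cdf(176)                  # P(X > 176)
print(round(p_a, 3), round(p_b, 3), round(p_c, 3))   # 0.309 0.136 0.274
```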




If a normally distributed random variable has mean 120, 
should a value of 150 be considered unusually large? The 
answer depends on how spread out the variable is, which is 
measured by its standard deviation. If the standard deviation is 
30 then a value around 150 will be quite common; however, if 
the standard deviation were 5 then 150 would be very unusual. 

It turns out that the probability of a normally distributed 
random variable being less than a given value (P(X < x), called 
the cumulative probability) depends only on the number of 
standard deviations x is from the mean. This is called 
the Z-score. 

KEY POINT 24.5 

For X ~ N(μ, σ²), the Z-score measures the number of 
standard deviations of x above the mean. 

$z = \dfrac{x - \mu}{\sigma}$ 

(a negative Z-score means x is below the mean) 



In the real world, there is always a 
possibility that a result may have 
occurred by random chance. Supplementary 
sheet 12 'Significant discoveries' explores how 
unlikely a result has to be 
before we accept it was 
not a fluke, which is often 
stated in terms of the Z-score. 



Worked example 24.5 


Given that X ~ N(15, 6.25): 






(a) How many standard deviations is x = 16.1 away from the mean? 

(b) Find the value of X which is 1.2 standard deviations below the mean. 




The number of standard deviations away from 
the mean is measured by the Z-score. 6.25 is the variance: 

(a) $\sigma = \sqrt{6.25} = 2.5$ 

$\therefore z = \dfrac{x - \mu}{\sigma} = \dfrac{16.1 - 15}{2.5} = 0.44$ 

16.1 is 0.44 standard deviations away 
from the mean. 

Values below the mean have a negative Z-score: 

(b) $z = -1.2$ 

$-1.2 = \dfrac{x - 15}{2.5} \iff x - 15 = -3 \iff x = 12$ 



Before graphical calculators 
existed (which wasn't so long 
ago!) people used tables 
showing cumulative 
probabilities of the 
standard normal 
distribution. Because of 
their importance they were 
given special notation: 
$\Phi(z) = P(Z \le z)$. 
Although you do not have 
to use this notation, you 
should understand what it 
means. 



If we are given a random variable X ~ N(μ, σ²) we can create 
a new random variable Z which takes the values equal to the 
Z-scores of the values of X. In other words, for each x there is a 
corresponding $z = \dfrac{x - \mu}{\sigma}$. This is called the standardised value. 

It turns out that, whatever the original mean and standard 
deviation of X, this new random variable always has normal 
distribution with mean 0 and variance 1, called the standard 
normal distribution: Z ~ N(0, l). This is an extremely 
important property of the normal distribution which needs to 
be used in situations when the mean and standard deviation of 
X are not known (see next section). 

KEY POINT 24.6 

The probabilities of X and Z are related by 

$P(X \le x) = P\left(Z \le \dfrac{x - \mu}{\sigma}\right)$ 

Worked example 24.6 


Let X ~ N(6, 0.5 2 ). Write the following in terms of probabilities of Z: 


(a) P(X< 6.1) 




(b) P(5< X< 7) 




(c) P(X>6.5) 




We are given that x = 6.1 so we can calculate z: 

(a) $P(X \le 6.1) = P\left(Z \le \dfrac{6.1 - 6}{0.5}\right) = P(Z \le 0.2)$ 

The relationship between X and Z 
above is stated for probabilities of the 
form $P(X \le k)$, so convert to that form 
first: 

(b) $P(5 < X < 7) = P(X \le 7) - P(X \le 5)$ 

$= P\left(Z \le \dfrac{7 - 6}{0.5}\right) - P\left(Z \le \dfrac{5 - 6}{0.5}\right)$ 

$= P(Z \le 2) - P(Z \le -2) = P(-2 < Z < 2)$ 

(c) $P(X > 6.5) = 1 - P(X \le 6.5)$ 

$= 1 - P\left(Z \le \dfrac{6.5 - 6}{0.5}\right) = 1 - P(Z \le 1)$ 

$= P(Z > 1)$ 



You can see from the examples above that you don't actually 
have to convert probabilities into the form P(X <k) every time; 
simply replace the x values by the corresponding z scores. 
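The claim that a probability for X equals the corresponding probability for Z can be checked numerically. The sketch below uses the distribution from Worked example 24.6, X ~ N(6, 0.5²); the helper `z_score` is a hypothetical name for the standardising formula in Key point 24.5.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma

X = NormalDist(mu=6, sigma=0.5)
Z = NormalDist()                 # standard normal, N(0, 1)

# P(5 < X < 7) computed directly and via the z-scores -2 and 2.
direct = X.cdf(7) - X.cdf(5)
standardised = Z.cdf(z_score(7, 6, 0.5)) - Z.cdf(z_score(5, 6, 0.5))
print(round(direct, 4), round(standardised, 4))
```

Both routes give the same value because the cumulative probability depends only on how many standard deviations the boundary is from the mean.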



Exercise 24C 

Find the following probabilities: 

(a) If X ~ N(20, 100), 

(i) P(X < 32) (ii) P(X < 12) 

(b) If Y ~ N(4.8, 1.44), 

(i) P(Y > 5.1) (ii) P(Y > 3.4) 

(c) If R ~ N(17, 2), 

(i) P(16 < R < 20) (ii) P(17.4 < R < 18.2) 

(d) If Q has a normal distribution with mean 12 and 
standard deviation 3: 

(i) P(Q > 9.4) (ii) P(Q < 14) 

(e) If F has a normal distribution with mean 100 and 
standard deviation 25: 

(i) P(|F - 100| < 15) (ii) P(|F - 100| > 10) 

Find the Z-score corresponding to the given value of X: 

(a) (i) X ~ N(12, 2²), x = 13 (ii) X ~ N(38, 7²), x = 45 

(b) (i) X ~ N(20, 9), x = 15 (ii) X ~ N(162, 25), x = 160 

Given that X ~ N(16, 2.5²), write the following in terms of 
probabilities of the standard normal variable: 

(a) (i) P(X < 20) (ii) P(X < 19.2) 

(b) (i) P(X > 14.3) (ii) P(X > 8.6) 

(c) (i) P(12.5 < X < 16.5) (ii) P(10.1 < X < 15.5) 

It is found that the lifespan of a certain brand of laptop 
batteries follows a normal distribution with mean 16 hours 
and standard deviation 5 hours. A particular battery has a 
lifespan of 10.2 hours. 

(a) How many standard deviations below the mean is this? 

(b) What is the probability that a randomly chosen laptop 
battery has a lifespan shorter than this? [6 marks] 

When Ali competes in long-jump competitions, the 
lengths of his jumps are normally distributed with mean 
5.2 m and standard deviation 0.7 m. 

(a) What is the probability that Ali will record a jump 
between 5 m and 5.5 m? 

(b) Ali needs to jump 6 m to qualify for the school team. 

(i) What is the probability that he will qualify with a 
single jump? 

(ii) If he is allowed three jumps, what is the probability 
that he will qualify for the school team? [7 marks] 



You saw in chapter 22 that ∩ means 
intersection. 

Weights of a species of cat have a normal distribution with mean 
16 kg and variance 16 kg². In a sample of 2000 such cats, estimate 
the number which will have a weight above 13 kg. [6 marks] 

If D ~ N(250, 400), find: 

(a) P(D > 265 ∩ D < 280) 

(b) P(D > 265 | D < 280) 

(c) P(D < 242 ∩ D > 256) [6 marks] 

If Q ~ N(4, 160), find: 

(a) P(5 < |Q|) 

(b) P(Q > 5 | 5 < |Q|) [6 marks] 



The weights of apples are normally distributed with mean 
weight 150 g and standard deviation 25 g. Supermarkets 
classify apples as medium if they are between 120 g and 170 g. 

(a) What proportion of apples are medium? 

(b) In a bag of 10 apples what is the probability that there 
are at least 8 medium apples? [6 marks] 

The wingspans of a species of pigeon are normally 
distributed with mean length 60 cm and standard 
deviation 6 cm. A pigeon is chosen at random. 

(a) Find the probability that its wingspan is greater than 50 cm. 

(b) Given that its length is greater than 50 cm, find the 
probability that a wingspan is greater than 55 cm. [6 marks] 



Grains of sand are believed to have a normal distribution 
with mean 2 mm and variance 0.25 mm 2 . 

(a) Find the probability that a randomly chosen grain of 
sand is larger than 1.5 mm. 

(b) The sand is passed through a filter which blocks grains 
wider than 2.5 mm. The sand that passes through 

the filter is examined. What is the probability that a 

randomly chosen grain of filtered sand is larger than 

1.5 mm? [6 marks] 

The amount of paracetamol per tablet is believed to be 
normally distributed with mean 500 mg and standard 
deviation 160 mg. A dose of less than 300 mg is ineffective 
in dealing with toothache. In a trial of 20 people treated 
for toothache with a single tablet, what is the probability 
that 2 or more of them have less than the effective dose? 

[6 marks] 

A variable has a normal distribution with a mean that is 
7 times its standard deviation. What is the probability of 
the variable taking a value less than 5 times the standard 
deviation? [6 marks] 

If X ~ N(μ, σ²) and P(X < x) = k, find P(X < 2μ - x) in 
terms of k. [5 marks] 



Inverse normal distribution 

In Section C we saw how to find probabilities when we knew 
information about the variable. In real life it is often useful to work 
backwards from probabilities to estimate information about the 
data. This requires the inverse normal distribution. 

KEY POINT 24.7 



For a given value of probability p the inverse normal 
distribution gives the value of x such that P( X < x) = p. 



Remember 

the cumulative 

obability. 



must 
You will need to use your GDC to work out the inverse normal 
distribution (see Calculator skills sheet 14, the section on 
'Finding the boundary' on the CD-ROM). To work out P(X > x) 
you might need to do 1 - P(X < x). Note that many textbooks 
use the Φ(z) notation mentioned in the previous section 
to write inverse normal distribution: If P(X < x) = p, then 
$\dfrac{x - \mu}{\sigma} = \Phi^{-1}(p)$. 



Worked example 24.7 



The size of men's feet is thought to be normally distributed with mean 22 cm and variance 
25 cm². A shoe manufacturer wants only 5% of men to be unable to find shoes large enough for 
them. How big should their largest shoe be? 

Convert question into mathematical terms: 
If X is the crv 'length of a man's foot' 
then X ~ N(22, 25) 
We want to find the value of x such that 
P(X > x) = 0.05 

Use inverse normal distribution. 
We may have to convert into a probability of 
the form P(X < x): 
P(X < x) = 1 - P(X > x) = 0.95 
⇒ x = 30.2 cm (from GDC) 
So their largest shoe must fit a foot 
30.2 cm long. 
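The inverse normal calculation can also be sketched with Python's standard library, where `NormalDist.inv_cdf(p)` returns the x for which P(X < x) = p. As with the GDC, the probability supplied must be the cumulative one, so the 5% upper tail is first converted to 0.95. The variable names here are illustrative.

```python
from statistics import NormalDist

# X ~ N(22, 25): variance 25 cm^2, so the standard deviation is 5 cm.
foot_length = NormalDist(mu=22, sigma=5)

upper_tail = 0.05
cumulative = 1 - upper_tail             # P(X < x) = 0.95
largest_shoe = foot_length.inv_cdf(cumulative)
print(round(largest_shoe, 1))           # 30.2
```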





One of the main applications of statistics is to determine 
parameters of the population given information about the data. 
But how can we use the normal distribution calculations if the 
mean or the standard deviation is unknown? This is where the 
standard normal distribution comes in useful; we can replace 
all the X values by their Z-scores, as they follow a known
distribution, N(0, 1).



Worked example 24.8 



The masses of gerbils are thought to be normally distributed. If 30% of gerbils have a mass of 
more than 65 g and 20% have a mass of less than 40 g, estimate the mean and the variance of 
the mass of a gerbil. 

Convert the information into mathematical terms:

If X is the crv 'mass of a gerbil' then X ~ N(μ, σ²)
P(X > 65) = 0.3
P(X < 40) = 0.2 (1)

If you need all the probabilities to be in the form P(X < k),
convert the first one:

P(X < 65) = 0.7 (2)

Use inverse normal distribution for Z (Z ~ N(0, 1)) and relate it
to the given X values:

from (1) P(Z < z) = 0.2 ⇒ z = (40 − μ)/σ = −0.842
from (2) P(Z < z) = 0.7 ⇒ z = (65 − μ)/σ = 0.524
(from GDC)

Solve simultaneous equations:

40 − μ = −0.842σ (3)
65 − μ = 0.524σ (4)

(4) − (3): 25 = 1.366σ
⇒ σ = 18.3 g, so the variance is σ² ≈ 335 g²
∴ μ = 55.4 g
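
The method of this worked example (two known probabilities giving two simultaneous equations in μ and σ) can be sketched in code. The function name `estimate_mu_sigma` is hypothetical, and Python's standard-library `statistics.NormalDist` is assumed as a substitute for the GDC; because the code keeps more decimal places in the z-values, its results may differ slightly from hand working before rounding.

```python
from statistics import NormalDist


def estimate_mu_sigma(x1, p1, x2, p2):
    """Solve P(X < x1) = p1 and P(X < x2) = p2 for mu and sigma."""
    std_normal = NormalDist()  # Z ~ N(0, 1)
    z1 = std_normal.inv_cdf(p1)
    z2 = std_normal.inv_cdf(p2)
    # x1 - mu = z1 * sigma and x2 - mu = z2 * sigma; subtract to eliminate mu
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma


# Gerbil data: P(X < 40) = 0.2 and P(X > 65) = 0.3, i.e. P(X < 65) = 0.7
mu, sigma = estimate_mu_sigma(40, 0.2, 65, 1 - 0.3)
print(f"mean = {mu:.1f} g, sd = {sigma:.1f} g, variance = {sigma**2:.0f} g^2")
```

The same function applies to any question of this type, provided both conditions are first rewritten in the form P(X < k).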



Exercise 24D 

1. (a) If X ~ N(14, 49), find x if:

(i) P(X < x) = 0.8 (ii) P(X < x) = 0.46

(b) If X ~ N(36.5, 10), find x if:

(i) P(X > x) = 0.9 (ii) P(X > x) = 0.4

(c) If X ~ N(0, 12), find x if:

(i) P(|X| < x) = 0.5 (ii) P(|X| < x) = 0.8

2. (a) If X ~ N(μ, 4), find μ if:

(i) P(X > 4) = 0.8 (ii) P(X > 9) = 0.2

(b) If X ~ N(8, σ²), find σ if:

(i) P(X < 19) = 0.6 (ii) P(X < 0) = 0.3

3. If X ~ N(μ, σ²), find μ and σ if:

(a) (i) P(X > 7) = 0.8 and P(X < 6) = 0.1

(ii) P(X > 150) = 0.3 and P(X < 120) = 0.4

(b) (i) P(X > 0.1) = 0.4 and P(X > 0.6) = 0.25
(ii) P(X > 700) = 0.8 and P(X > 400) = 0.99



© Cambridge University Press 2012






IQ tests are designed to have a mean of 100 and a standard
deviation of 20. What IQ score is needed to be in the top
2% of IQ scores? [5 marks]



Rabbits' masses are normally distributed with an average
mass of 2.6 kg and a variance of 1.44 kg². A vet decides that
the top 20% of rabbits are obese. What is the minimum
mass for an obese rabbit? [5 marks]



A manufacturer knows that his machines produce bolts
whose diameters follow a normal distribution with 
standard deviation 0.02 cm. He takes a random sample of 
bolts and finds that 6% of them have diameter greater than 
2 cm. Find the mean diameter of the bolts. [6 marks] 



(a) 30% of sand from Playa Gauss falls through a sieve
with gaps of 1 mm, but 90% passes through a sieve
with gaps of 2 mm. Assuming that a grain of sand's
diameter is normally distributed, estimate the mean
and standard deviation of the sand grains.

(b) 80% of sand from Playa Fermat falls through a sieve with
gaps of 2 mm. 40% of this filtered sand passes through a
sieve with gaps of 1 mm. Assuming that a grain of sand's
diameter is normally distributed, estimate the mean and
standard deviation of the sand grains. [7 marks]

The actual voltage of a brand of 9 V battery is thought
to be normally distributed with standard deviation
0.8 V and mean (9.2 − t) V where t is the time in hours
that the battery has been used. When a battery's voltage
drops below 7 V it can no longer power a lamp. A batch
of batteries is found and only 10% can power the lamp.
Assuming that the model is correct and that they were all
used for the same amount of time, estimate for how long
the batteries have been used. [7 marks]



The times taken for students to complete a test are 
normally distributed with a mean of 32 minutes and 
standard deviation of 6 minutes. 

(a) Find the probability that a randomly chosen student 
completes the test in less than 35 minutes. 

(b) 90% of students complete the test in less than 
t minutes. Find the value of t. 

(c) A random sample of 8 students had their time for the
test recorded. Find the probability that exactly 2 of
these students completed the test in less than
30 minutes. [7 marks]



An old textbook says that the range of data can be estimated 
as 6 times the standard deviation. If the data is normally 
distributed what percentage of the data is within this range? 

[6 marks] 

A scientist noticed that 36% of temperature measurements 
were at least 4°C lower than the mean. Assuming that the 
measurements follow a normal distribution, estimate the 
standard deviation. [5 marks] 

For a normal distribution find the ratios:

(a) median : mean

(b) standard deviation : inter-quartile range [6 marks]

Evaluate Φ⁻¹(x) + Φ⁻¹(1 − x). [3 marks]

A company makes a large number of steel links for chains.
They know that the force required to break any individual
link is modelled by a normal distribution with mean
20 kN. The company tests chains consisting of 4 links. If
any link breaks, the chain will break. A force of 18 kN is
applied to all of the chains and 30% break.

(a) Estimate the probability of a single link breaking. 

(b) Hence estimate the standard deviation in the breaking 
strength of the links. [6 marks] 

Most calculators have a random number generator which 
generates random numbers distributed uniformly from 
0 to 1. How can you use these to form random numbers 
that could be drawn from a normal distribution? [4 marks] 



Summary 

• Because we group continuous data, the probability of a continuous random variable (crv)
is discussed in terms of the probability of it being in a given range. To do this we integrate a
probability density function such that the area under the curve f(x) represents the probability.
The probability of the crv falling between values a and b is:

P(a < X < b) = ∫_a^b f(x) dx